ABSTRACT

Selection of items for analogy tests according to the
Rasch item probability of "goodness of fit" to the model is compared
with three commonly used item selection criteria: item
discrimination, item difficulty, and item-ability correlation. Word,
picture, symbol, and number analogies in multiple-choice format were
administered to several hundred college students. Analysis showed
that Rasch item probabilities of .05 and .01 are more lenient criteria
(in terms of proportion of items rejected) than commonly used
criteria (item difficulty between .20 and .80, item discrimination
of .20, item-ability correlation of .20). Results also showed only a
moderate amount of overlap among the four criteria, with the Rasch
item probability and item discrimination being the most similar, and
item difficulty and item-ability correlation being the most
dissimilar. (Author)
A Comparison of the Rasch Item
Probability with Three Common Item 
Characteristics as Criteria for Item Selection 

Howard E. A. Tinsley and René V. Dawis

Rasch (1960, 1966) proposed a simple logistic model for ability and
achievement tests involving two parameters -- a person parameter pertaining
to the person's ability, and an item parameter pertaining to the difficulty
of the measurement. Rasch's model allows the separation of, and independent
estimation of, these two parameters. Since the item parameter can
be estimated in a manner that does not depend on the ability level of the
sample of persons used in the estimation, Rasch's procedure has been
characterized as sample-free (Wright and Panchapakesan, 1969).

As described by Wright and Panchapakesan (1969), the Rasch procedure
consists of two stages: item calibration and person measurement. Item
calibration consists of estimating the item parameters and their standard
errors from the responses of a large sample of persons to the set of items.
Items which do not satisfy the criterion of "fit" to the model are
eliminated. The remaining "good-fitting" items are then used to obtain test
scores for the persons in the sample. From these scores and the
difficulties (or, conversely, easinesses) of the items used, an estimate of
each person's ability and the standard error of this estimate are obtained.
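
The two stages described above can be illustrated with a minimal sketch. It assumes the usual logistic form of the Rasch model, in which the probability of a correct response depends only on the difference between ability and difficulty, and it estimates ability from a raw score by a generic Newton-Raphson maximum-likelihood step. The function names are hypothetical, and this is not claimed to be the Wright and Panchapakesan procedure itself.

```python
import math
from typing import Sequence, Tuple


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct response under the simple logistic model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def estimate_ability(raw_score: int, difficulties: Sequence[float],
                     tol: float = 1e-10, max_iter: int = 50) -> Tuple[float, float]:
    """Return (ability estimate, standard error) for a raw score.

    Solves sum_i P_i(ability) = raw_score by Newton-Raphson; item
    difficulties are treated as already calibrated (an assumption).
    """
    n = len(difficulties)
    if not 0 < raw_score < n:
        raise ValueError("zero and perfect scores have no finite estimate")
    # Starting value: log-odds of the raw score, shifted by mean difficulty.
    ability = math.log(raw_score / (n - raw_score)) + sum(difficulties) / n
    for _ in range(max_iter):
        probs = [rasch_probability(ability, b) for b in difficulties]
        information = sum(p * (1.0 - p) for p in probs)
        step = (raw_score - sum(probs)) / information
        ability += step
        if abs(step) < tol:
            break
    probs = [rasch_probability(ability, b) for b in difficulties]
    information = sum(p * (1.0 - p) for p in probs)
    return ability, 1.0 / math.sqrt(information)
```

For example, `estimate_ability(2, [-1.0, 0.0, 1.0])` returns the ability at which the three expected item scores sum to 2, together with the standard error implied by the test information at that point.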

The present study concerns the selection of items for analogy tests
according to the Rasch procedure's "goodness-of-fit" test, and how this
selection compares with item selection based on three commonly used item
characteristics, namely, item discrimination, item difficulty and
item-ability correlation.



Method 



Instruments -- Five analogy tests were utilized in the study: one form
each for word, picture and symbol analogies and two number analogy forms.
There were 94 items in the word analogy test, 82 items in the symbol analogy
test, 99 items in the picture analogy test, and 178 number analogy items,
93 in one form and 85 in the other. All items were of the multiple-choice
type, with five response alternatives and with the blank in the item stem
occurring in any of the four positions of the analogy elements (i.e., in
the A, B, C or D position in the analogy A:B::C:D). All tests were introduced
by one standard page of test instructions.

Subjects -- The subjects in the study were college students enrolled
in an introductory psychology class at the University of Minnesota during
the fall of 1970. All subjects were volunteers (obtained through the subject
pool of the Department of Psychology) who were participating in the research
to gain additional points toward their course grade. Each student completed
one, two or three tests. A total of 1,400 tests were completed, including
304 word analogy tests, 319 picture analogy tests, 301 symbol analogy tests,
and 268 of one form and 203 of the other form of the number analogy test.

Administration -- Because the test forms were designed to be
self-explanatory, subjects were simply given the test and instructed to read
the directions and complete the test. The test administrator was always
available, however, to answer any questions. Each subject was allowed to
complete one, two, or three tests. Tests were administered in the following
order: 1) word, 2) picture, 3) symbol, 4) number, form 1, and 5) number,
form 2. No time limits were set for completion of the tests.

Analysis -- Item analysis was performed using the Bart et al. (1970)
adaptation of the Wright and Panchapakesan (1970) computer program. This
program outputs, for each item, the item difficulty (proportion of correct
responses), the Rasch item easiness estimate and its error term, the
item-ability correlation, the item discrimination, and the Rasch probability
value for the "goodness of fit" test. Of interest to this study are item
difficulty, item-ability correlation, item discrimination and the Rasch
item probability (of "fit" to the model).

The Rasch item probability is the probability of the observed response
pattern given the hypothesis that the item fits the Rasch simple logistic
model. According to Wright (1970), the problem of item fit is "not simple".
The P-value is the probability of a chi-square value derived by summing
squared normal deviate values across score groups (with df = number of score
groups minus 1). The normal deviate values, in turn, are normal deviate
transformations of "proportion correct" values for each score group. Thus,
a normal deviate of 2 or less is considered acceptable while values greater
than 3 are unacceptable. With acceptable normal deviates, the P-value of
the resulting chi-square can range below .001; hence cut-off points of .05
(as recommended by Brooks, 1964) or even of .01 (as recommended by Anderson
et al., 1963) may be overly stringent. The number of persons in each score
group is another factor, since a misfit based on a small (less than 10)
group is less significant than one based on a large (greater than 20) group.
Nonetheless, for this study, Rasch item probability cutoffs of .01, .05,
.10, .25, .50, and .75 were specified as the minimum acceptable criterion
values.
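
The fit probability described above can be sketched as follows. The sketch takes the normal deviates for the score groups as given (how the program derives them from the "proportion correct" values is not reproduced here), sums their squares, and returns the upper-tail chi-square probability with df equal to the number of score groups minus 1. The function names are hypothetical, and the closed-form tail probability for integer df is a standard identity rather than the routine used in the original program.

```python
import math
from typing import Sequence


def chi_square_upper_tail(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be at least 1")
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over k < df/2 of (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc term plus a finite sum over half-integer gamma values.
    total = math.erfc(math.sqrt(half))
    for k in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (k - 0.5) / math.gamma(k + 0.5)
    return total


def item_fit_probability(normal_deviates: Sequence[float]) -> float:
    """P-value for one item from its per-score-group normal deviates."""
    chi_square = sum(z * z for z in normal_deviates)
    df = len(normal_deviates) - 1  # number of score groups minus 1
    return chi_square_upper_tail(chi_square, df)
```

With ten score groups that each show a deviate of 2, the chi-square is 40 on 9 df and `item_fit_probability([2.0] * 10)` falls below .001, which illustrates the point made above that individually acceptable deviates can still yield a very small P-value.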

Using item difficulty as a selection criterion is justified on the
grounds that item variance is a function of item difficulty, and it is
desirable to select the items with the largest variances since test variance
is a function of the summed item variances (Lord and Novick, 1968, Ch. 15).
Item variance is at its greatest for item difficulty at p = .5 and decreases
as p deviates from .5. Three criterion levels for item selection were used
in this study: .20 ≤ p ≤ .80, .30 ≤ p ≤ .70, and .40 ≤ p ≤ .60.
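
A short sketch of the reasoning above: for a dichotomously scored item with proportion correct p, the item variance is p(1 - p), which peaks at p = .5, so a difficulty band centered on .5 retains the higher-variance items. The function names and the sample difficulties are hypothetical.

```python
from typing import List, Sequence


def item_variance(p: float) -> float:
    """Variance of a 0/1 item score with proportion correct p."""
    return p * (1.0 - p)


def select_by_difficulty(difficulties: Sequence[float],
                         low: float, high: float) -> List[int]:
    """Indices of items whose proportion correct lies in [low, high]."""
    return [i for i, p in enumerate(difficulties) if low <= p <= high]


# Hypothetical proportions correct for four items.
sample = [0.15, 0.25, 0.50, 0.85]
kept = select_by_difficulty(sample, 0.20, 0.80)  # items 1 and 2 are retained
```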

Item-ability correlation is the point-biserial correlation of item
scores with ability scores and is an index of item validity (Lord and Novick,
1968, Ch. 15). For this study, two levels were used as criteria for item
selection: r_pb ≥ .20 and r_pb ≥ .30.
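
As an illustration, the point-biserial correlation is simply the Pearson correlation between a 0/1 item score and a continuous score. The sketch below computes it directly; the function name is hypothetical, and the ability scores are taken as given rather than derived from the Rasch calibration.

```python
import math
from typing import Sequence


def point_biserial(item_scores: Sequence[int],
                   ability_scores: Sequence[float]) -> float:
    """Pearson correlation between 0/1 item scores and ability scores."""
    n = len(item_scores)
    if n != len(ability_scores) or n < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    mean_x = sum(item_scores) / n
    mean_y = sum(ability_scores) / n
    cov = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(item_scores, ability_scores))
    var_x = sum((x - mean_x) ** 2 for x in item_scores)
    var_y = sum((y - mean_y) ** 2 for y in ability_scores)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / math.sqrt(var_x * var_y)
```

An item would meet the first criterion level when `point_biserial(item, ability) >= 0.20`.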

Item discrimination is an index derived from the biserial correlation
between latent ability and scores on the item according to the formula (Lord
and Novick, 1968, p. 373)

$$ d = \frac{r_b}{\sqrt{1 - r_b^{2}}} $$

where d = discrimination index

      r_b = biserial correlation between item score
            and latent ability.

Birnbaum (in Lord and Novick, 1968, p. 474) states that .93 and .20 represent
the extremes of the range of item discrimination values encountered in practice.
Three levels of item discrimination values were used in this study as criteria
for item selection: d ≥ .20, d ≥ .30, and d ≥ .40.

The above criteria for item selection were compared with respect to the
percentage of items in each test that met each criterion (i.e., each level
of each type of criterion). Since, in practice, item selection is usually
based on more than one criterion, the percentage of items meeting two
criteria was examined for every pair of criteria. Of major interest was the
percentage of those items meeting the Rasch item probability criteria which
also met other criteria.
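
The criteria comparison described in this section can be sketched as follows. The discrimination function assumes the normal-ogive relation d = r_b / sqrt(1 - r_b^2) given by Lord and Novick (1968); that the analysis program used exactly this form is an assumption. The item records, their field names, and the helper names are hypothetical and serve only to show how single and paired selection percentages are tallied.

```python
import math
from typing import Callable, Dict, Sequence

Item = Dict[str, float]


def discrimination_index(r_biserial: float) -> float:
    """Discrimination from a biserial correlation (assumed relation)."""
    return r_biserial / math.sqrt(1.0 - r_biserial ** 2)


def percent_meeting(items: Sequence[Item],
                    *criteria: Callable[[Item], bool]) -> float:
    """Percentage of items that satisfy every criterion passed in."""
    kept = sum(1 for item in items if all(c(item) for c in criteria))
    return 100.0 * kept / len(items)


# Hypothetical item statistics: fit probability, discrimination,
# difficulty (proportion correct), and item-ability correlation.
items = [
    {"fit_p": 0.40, "d": 0.35, "p": 0.55, "r_pb": 0.28},
    {"fit_p": 0.03, "d": 0.15, "p": 0.90, "r_pb": 0.12},
    {"fit_p": 0.08, "d": 0.25, "p": 0.45, "r_pb": 0.22},
    {"fit_p": 0.60, "d": 0.10, "p": 0.30, "r_pb": 0.18},
]


def rasch_05(item: Item) -> bool:
    return item["fit_p"] >= 0.05  # cutoff is a minimum acceptable value


def disc_20(item: Item) -> bool:
    return item["d"] >= 0.20


single = percent_meeting(items, rasch_05)          # one criterion
paired = percent_meeting(items, rasch_05, disc_20)  # both criteria
```

Passing two predicates gives the percentage of items meeting both criteria, which is the quantity tabulated for each pair of criterion levels.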



Results 



Table 1 shows the percentages of items for each type of analogy test
that satisfied the various criteria. As might be expected, the proportion
of items selected depended on both the type of criterion and the type of
analogy test. The largest percentages of items were consistently selected
for number analogies when the Rasch item probability, item discrimination,
or item-ability correlation was employed as the selection criterion. For
item difficulty as the criterion, the highest selection rate was observed
for word analogies. The lowest percentages of items selected were
consistently for picture analogies when the Rasch item probability or
item-ability correlation was used as the criterion. The lowest selection rates
tended to be for symbol analogies when using item discrimination as the
criterion, and for number analogies when using item difficulty as the
criterion. A Rasch item probability of .01 was the most lenient criterion
among those tried out in this study, regardless of type of analogy item.
In terms of percentages of items selected, an item discrimination level of
.20 was approximately equivalent to a Rasch item probability of .05, while
an item difficulty range of .20 to .80 was approximately equivalent to a
Rasch item probability of .25. This was generally true for the four
different types of analogy items. However, an item-ability correlation of
.20 was approximately equivalent to a Rasch item probability of .25 for word
analogies, .35 for symbol and picture analogies, and .10 for number analogies.

Insert Table 1 here 

Tables 2, 3, 4, and 5 show the percentages of items satisfying both of
each pair of criteria, for word, symbol, picture, and number analogies,
respectively. Table 2 shows that for word analogies, and for the Rasch item
probability paired with other criteria, the most lenient levels used in this
study selected between 56% and 78% of the items. For other pairs of
criteria, the selection rates for the most lenient levels ranged between
33% and 59%. Similar percentage rates were observed for symbol analogies
(Table 3) and picture analogies (Table 4). For number analogies, as shown
in Table 5, the percentage rates for the pairings of Rasch item probability
with other criteria, both at the most lenient levels, ranged from 40% to
90%. Pairings of the other criteria resulted in selection rates at the most
lenient levels that ranged from 32% to 80%. These percentage rates for
paired criteria provide some idea of the overlap (or lack of overlap)
between the criteria. In terms of percentage of items selected in common,
the Rasch item probability and item discrimination tended to be most
similar, while item difficulty and item-ability correlation tended to be
most dissimilar. In both instances, the supportive results were uniformly
found across all four types of analogy items.
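
One way to read the paired percentages is to express them relative to a single criterion, that is, as the share of items passing one criterion that also pass another. The sketch below shows that arithmetic with two figures reported in the tables for word analogies (94% of items meet a Rasch item probability of .01, and 56% meet both that level and an item-ability correlation of .20). The function name is hypothetical, and this conditional percentage is an illustration rather than a statistic reported in the study.

```python
def share_also_meeting(joint_percent: float, base_percent: float) -> float:
    """Percentage of items passing the base criterion that also pass a second one."""
    if base_percent <= 0:
        raise ValueError("base percentage must be positive")
    return 100.0 * joint_percent / base_percent


# Word analogies: Rasch probability .01 alone (Table 1) and paired with
# item-ability correlation .20 (Table 2).
word_share = share_also_meeting(56.0, 94.0)  # roughly 60 percent
```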

Insert Tables 2, 3, 4, and 5 here

Conclusion

In this study, the Rasch item probability (an index of "goodness of fit"
of the item to the Rasch simple logistic model) was compared with three other
item characteristics -- item difficulty, item-ability correlation, and item
discrimination -- as criteria for item selection. The results of this study
show that Rasch item probability levels of .01 and .05 proposed as criteria
for item selection are more lenient (in terms of proportion of items rejected)
than commonly used levels of the other item characteristics (to wit, item
difficulty of between .20 and .80, and item-ability correlation and item
discrimination of .20 or greater). This finding was true for all four types
of analogy items used: word, symbol, picture and number. The results also
showed only a moderate amount of overlap among the four criteria, with the
Rasch item probability and item discrimination being the most similar, and
item difficulty and item-ability correlation being the most dissimilar,
criteria for item selection.

Table 1

Percentage of Items Satisfying Four Item
Selection Criteria, by Type of Analogy Test

                                    Type of Analogy Test

                                Word      Symbol     Picture     Number
Criterion                     94 Items   82 Items   99 Items   178 Items

Rasch Item Probability
   .01                           94         95         92         96
   .05                           86         84         83         88
   .10                           78         78         75         82
   .25                           60         61         52         66
   .50                           32         33         27         48
   .75                           11         13         10         27

Discrimination
   .20                           81         70         76         92
   .30                           62         50         58         83
   .40                           50         44         29         74

Difficulty
   .20 - .80                     74         56         65         42
   .30 - .70                     59         34         43         30
   .40 - .60                     36         20         22         14

Item-Ability Correlation
   .20                           56         45         37         80
   .30                           23         11         07         53

Table 2

Percentage of 94 Word Analogy Items
Satisfying Pairs of Item Selection Criteria

                                                                     Item-Ability
                          Discrimination          Difficulty          Correlation
Criterion                  .20   .30   .40  .20-.80  .30-.70  .40-.60   .20   .30

Rasch Item Probability
   .01                      78    62    49       69       52       30    56    22
   .05                      73    60    47       65       50       29    53    21
   .10                      66    53    41       59       45       27    43    20
   .25                      50    40    31       47       35       19    36    15
   .50                      26    20    16       26       13       08    10    09
   .75                      09    06    05       07       04       01    06    03

Item-Ability Correlation
   .20                      56    56    50       33       23       1C
   .30                      23    23    23       18       13       07

Difficulty
   .20 - .80                59    47    33
   .30 - .70                46    34    28
   .40 - .60                27    19    15

Table 3

Percentage of 82 Symbol Analogy Items Satisfying
Pairs of Item Selection Criteria

                                                                     Item-Ability
                          Discrimination          Difficulty          Correlation
Criterion                  .20   .30   .40  .20-.80  .30-.70  .40-.60   .20   .30

Rasch Item Probability
   .01                      70    50    44       56       37       18    45    11
   .05                      63    49    44       49       33       18    45    11
   .10                      61    43    43       46       32       18    44    10
   .25                      51    41    38       34       24       17    39    09
   .50                      27    21    18       13       15       11    20    04
   .75                      11    07    06       07       05       04    07    00

Item-Ability Correlation
   .20                      45    45    44       34       28       15
   .30                      11    11    11       07       06       05

Difficulty
   .20 - .80                43    34    32
   .30 - .70                33    28    26
   .40 - .60                17    16    16

Table 4

Percentage of 99 Picture Analogy Items Satisfying
Pairs of Item Selection Criteria

                                                                     Item-Ability
                          Discrimination          Difficulty          Correlation
Criterion                  .20   .30   .40  .20-.80  .30-.70  .40-.60   .20   .30

Rasch Item Probability
   .01                      76    53    28       62       40       20    46    07
   .05                      69    53    25       54       35       17    42    04
   .10                      62    50    23       48       30       14    40    04
   .25                      43    36    13       35       21       08    27    03
   .50                      21    13    06       17       11       05    12    00
   .75                      08    06    02       06       04       03    03    00

Item-Ability Correlation
   .20                      47    47    27       34       21       09
   .30                      07    07    07       06       04       02

Difficulty
   .20 - .80                53    40    20
   .30 - .70                34    25    12
   .40 - .60                15    10    05

Table 5

Percentage of 178 Number Analogy Items Satisfying
Pairs of Item Selection Criteria

                                                                     Item-Ability
                          Discrimination          Difficulty          Correlation
Criterion                  .20   .30   .40  .20-.80  .30-.70  .40-.60   .20   .30

Rasch Item Probability
   .01                      90    82    73       40       29       13    79    53
   .05                      83    76    68       36       27       13    74    51
   .10                      78    73    64       34       26       12    71    43
   .25                      63    60    52       25       10       09    53    42
   .50                      46    44    41       14       09       05    43    33
   .75                      26    25    24       08       06       03    25    13

Item-Ability Correlation
   .20                      80    80    74       32       24       11
   .30                      53    53    53       18       14       07

Difficulty
   .20 - .80                39    30    29
   .30 - .70                29    25    21
   .40 - .60                12    11    10